Knowledge-Based Visual Question Answering Using Multi-Modal Semantic Graph
Abstract
The field of visual question answering (VQA) has seen a growing trend of integrating external knowledge sources to improve performance. However, owing to the potential incompleteness of external knowledge sources and the inherent mismatch between different forms of data, current knowledge-based visual question answering (KBVQA) techniques are still confronted with the challenge of effectively utilizing multiple heterogeneous data. To address this issue, a novel approach centered on a multi-modal semantic graph (MSG) is proposed. The MSG serves as a mechanism for unifying the representation of heterogeneous data and diverse types of knowledge. Additionally, a multi-modal semantic graph knowledge reasoning model (MSG-KRM) is introduced to perform deep fusion of image–text information and external knowledge sources. The development of the semantic graph involves extracting keywords from the image object detection information, the question text, and external knowledge texts, which are then represented as symbol nodes. Three semantic graphs are constructed based on the knowledge graph, covering vision, question, and external knowledge text, and non-symbol nodes are added to connect these three independent graphs, which are marked with their respective node and edge types. During the inference stage, the multi-modal semantic graph and the image–text information are embedded into the feature semantic graph through three embedding methods, and a type-aware graph attention module is employed for deep reasoning. The final answer prediction is a blend of the output of the pre-trained model, the graph pooling results, and the characteristics of the non-symbolic nodes. The experimental results on the OK-VQA dataset show that the MSG-KRM model is superior to existing methods in terms of overall accuracy score, achieving a score of 43.58, with improved accuracy for most subclass questions, proving the effectiveness of the proposed method.
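The graph construction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class `SemanticGraph`, the helper `build_msg`, the type labels, and the sample keywords are all hypothetical. The sketch assumes one non-symbol node per source graph, linked to that graph's symbol nodes and to the other non-symbol nodes, which is only one possible reading of how the three graphs are connected. Edges derived from a knowledge graph are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticGraph:
    """A tiny typed graph: every node and every edge carries a type label."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, target id, edge type)

    def add_node(self, node_id, node_type):
        self.nodes[node_id] = node_type

    def add_edge(self, source, target, edge_type):
        self.edges.append((source, target, edge_type))


def build_msg(keywords_by_source):
    """Merge per-source keyword lists into one typed graph (hypothetical helper)."""
    graph = SemanticGraph()
    hubs = []
    for source, keywords in keywords_by_source.items():
        # Assumption: one non-symbol node stands for each source graph.
        hub = f"<{source}>"
        graph.add_node(hub, "non-symbol")
        hubs.append(hub)
        for keyword in keywords:
            # Each extracted keyword becomes a symbol node typed by its source.
            node_id = f"{source}:{keyword}"
            graph.add_node(node_id, f"{source}-symbol")
            graph.add_edge(hub, node_id, f"{source}-member")
    # Assumption: non-symbol nodes are linked pairwise to join the three graphs.
    for i, first in enumerate(hubs):
        for second in hubs[i + 1:]:
            graph.add_edge(first, second, "cross-graph")
    return graph


# Made-up keywords standing in for detector, question, and knowledge-text output.
example = build_msg({
    "vision": ["dog", "frisbee"],
    "question": ["sport"],
    "knowledge": ["frisbee", "game"],
})
print(len(example.nodes), len(example.edges))
```

Keeping a type label on every node and edge is what would let a type-aware attention module treat, for example, a vision symbol node differently from a knowledge symbol node with the same keyword.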
Similar resources
Constraint-Based Question Answering with Knowledge Graph
WebQuestions and SimpleQuestions are two benchmark data-sets commonly used in recent knowledge-based question answering (KBQA) work. Most questions in them are ‘simple’ questions which can be answered based on a single relation in the knowledge base. Such data-sets lack the capability of evaluating KBQA systems on complicated questions. Motivated by this issue, we release a new data-set, namely...
Multi-Modal Question-Answering: Questions without Keyboards
This paper describes our work to allow players in a virtual world to pose questions without relying on textual input. Our approach is to create enhanced virtual photographs by annotating them with semantic information from the 3D environment’s scene graph. The player can then use these annotated photos to interact with inhabitants of the world through automatically generated queries that are gu...
Knowledge-Based Question Answering
This paper describes the Webclopedia Question Answering system, in which methods to automatically learn patterns and parameterizations are combined with hand-crafted rules and concept ontologies. The source for answers is a collection of 1 million newspaper texts, distributed by NIST. In general, two kinds of knowledge are used by Webclopedia to answer questions: knowledge about language and kn...
Question Answering System Using Semantic Dependency Tree and State Graph
The basic architecture of a Question Answering System (QAs), based on Natural Language Processing, subsumes question analysis and answer extraction. The paper presents a system which is based on semantic analysis, relates the words logically and provides an admissible answer to the user query. Instead of using template based query, it accepts questions phrased in various forms. The question is ...
Knowledge Based Question Answering
The natural language database query system incorporated in the KNOBS interactive planning system comprises a dictionary driven parser, APE-II, and script interpreter which yield a conceptual dependency conceptualization as a representation of the meaning of user inpu...
Journal
Journal title: Electronics
Year: 2023
ISSN: 2079-9292
DOI: https://doi.org/10.3390/electronics12061390